Statistics:Introduction/What is Statistics
What is Statistics?

Suppose we are biologists, and we want to know the average weight of a zebra. If we were to weigh all zebras and divide the aggregate weight by the number of zebras, we would have the average weight of the population of zebras. That description of a population is called a parameter.

But we rarely have an entire population. We can't measure the weight of every zebra in the world, for example. Even if we could track them all down and weigh each, their weights would change over time, some zebras would die, some new ones would be born, etc. That is, the population of zebras is ever changing.

Perhaps a better strategy would be to pick a few zebras (in other words, a sample of zebras) and average their weights. We could then use that sample average to estimate the population parameter. Incidentally, a description of a sample is called (drum roll, please) a statistic.

If we did calculate a statistic to estimate a parameter, what are the odds that our average zebra weight would be a good estimate of the population's average? How many zebras would we need to weigh in order to be reasonably confident that our sample average is a good estimate of the population average? What margin of error should we allow so that the odds of our estimate being a fluke fall below, say, one in 25 or one in 100? Statistics, using results from the mathematical theory of probability, seeks to answer these questions.

Over the last few decades, knowledge of statistical analysis has become a prerequisite in many academic disciplines, from business to psychology to forestry. Despite this, statistics is one of the most misunderstood disciplines. The layman understands statistics as the obsessive-compulsive act of accumulating large volumes of numbers, from baseball scores to everything in the Farmer's Almanac. Journalists and politicians are notorious for their ignorance of, and deliberate abuse of, statistical results.
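As a rough illustration of the zebra example above, the following Python sketch draws samples of different sizes from a simulated population and reports each sample average with a margin of error. Everything specific here is an assumption made for illustration: the population is invented (10,000 simulated weights centred on 320 kg with a spread of 35 kg), the function name `estimate_mean` and its `fluke_odds` parameter are hypothetical, and the margin of error relies on a normal approximation that is only reasonable for fairly large samples.

```python
import math
import random
import statistics


def estimate_mean(sample, fluke_odds=1 / 25):
    """Return (sample mean, margin of error) for a list of measurements.

    Assumption: a normal approximation is used for the margin of error,
    which is only reasonable for fairly large samples.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error: sample standard deviation divided by sqrt(n).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value: the estimate may miss high or low.
    z = statistics.NormalDist().inv_cdf(1 - fluke_odds / 2)
    return mean, z * std_err


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    # Hypothetical population: made-up zebra weights in kg.
    population = [rng.gauss(320, 35) for _ in range(10_000)]
    parameter = statistics.mean(population)  # the population parameter

    for n in (10, 100, 1000):
        sample = rng.sample(population, n)  # weigh only n zebras
        statistic, margin = estimate_mean(sample)
        print(f"n={n:4d}  sample mean={statistic:6.1f} kg  "
              f"margin=+/-{margin:4.1f} kg  "
              f"(population mean={parameter:.1f} kg)")
```

In a sketch like this, the margin of error shrinks as the sample grows, because the standard error is divided by the square root of the sample size. Asking for smaller odds of a fluke (one in 100 rather than one in 25) widens the margin for the same sample.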
Not surprisingly, statistical results are often viewed with skepticism, as in the words often attributed to Disraeli, "There are lies, damned lies, and statistics!" However, the proper use of statistical analysis affects all of us. Doctors can judge the effectiveness of a treatment. Statistical quality control means that manufacturers don't have to take every widget they make off the production line to be tested. Psychologists have methods of checking whether the response of an experimental subject is a fluke or a behavioural trait. Fund managers can decide whether an investment strategy really beats the market or whether they just got lucky.

The scientific method relies on statistics. How do we analyse the result of an experiment? Is, for example, a sample average of zebra weights a valid result? How do we design our experiment? Using our zebra example, which zebras should we measure, and how many? Thanks to statistics we can answer these questions properly and save lives and money.

Prehistory of Probability and Statistics

Modern statistics draws from the study of probability. Some aspects of probability, notably the study of combinatorics, have existed for as long as people have been interested in gambling. In the 17th and 18th centuries, mathematicians began to develop probability as a distinct branch of mathematics, motivated by scientists of that era who were interested in dealing with error systematically. When scientists repeat an experiment to check its results, measurements will vary for many reasons. Scientists sought to understand these errors to make the best possible use of their experimental results.

Two figures who could arguably be considered the fathers of statistics were Karl Pearson (1857-1936) and R.A. Fisher (1890-1962). Rather lamentably, Karl Pearson, and later his son E.S. Pearson, had a long-running and bitter feud with Fisher.
Pearson is credited with the introduction of linear regression and correlation and with the classification of probability distributions, studying phenomena in nature whose behavior generally fell into one or a few basic mathematical patterns. Fisher worked at the Rothamsted Agricultural Research Station, where he pioneered the design of experiments and established the concepts that would become the foundation of modern statistics.

This material has been imported from the wikibook "Statistics" [http://en.wikibooks.org/wiki/Statistics] under the GNU Free Documentation License.